Can We Get A Better Retrieval Function From Machine?

Authors

  • Weiguo Fan
  • Wensi Xi
  • Edward A. Fox
  • Li Wang
Abstract

The quality of an information retrieval (IR) system depends heavily on its retrieval function, which returns a similarity score between the query and each document in the collection. Documents are sorted by their similarity to the query, and those ranked highest are assumed to be relevant. Okapi BM25 and its variations are very popular retrieval functions and appear to be the default choice in the IR research community; many other functions, such as Pivoted TFIDF and INQUERY, are also widely used and well studied. Most retrieval functions in use today are derived from probabilistic theories and are tuned in practice to suit different contexts and information needs. In this paper, we propose that a good retrieval function can be discovered by a pure machine learning approach, without relying on probabilistic theories or knowledge-based techniques. Two machine learning algorithms, Support Vector Machine (SVM) and Genetic Programming (GP), are used for retrieval function discovery, and GP is found to be the more effective approach. The retrieval functions discovered by GP may be hard for humans to interpret, but their performance is superior to that of Okapi BM25, one of the most popular functions. Combining the new retrieval function with query expansion techniques improves retrieval performance significantly. Based on the observations in our empirical study, the GP function is more reliable and effective than Okapi BM25 when query expansion techniques are used.
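To make the notion of a retrieval function concrete, the following is a minimal sketch of the ranking process the abstract describes, using one commonly cited form of Okapi BM25 as the scoring function. The abstract gives neither a formula nor parameter values, so the exact variant, the defaults `k1=1.2` and `b=0.75`, and the names `bm25_scores` and `rank` are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score tokenized documents against a tokenized query.

    Uses a common BM25 variant; k1 and b are assumed defaults,
    not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, docs):
    """Return document indices sorted by descending similarity to the query."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy collection, already tokenized.
docs = [
    ["genetic", "programming", "retrieval"],
    ["image", "retrieval"],
    ["tunnel", "boring", "machine"],
]
query = ["genetic", "retrieval"]
ranking = rank(query, docs)
```

A learned retrieval function, such as one discovered by GP, would take the place of the hand-derived scoring expression inside the loop, while the sorting step stays the same.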


Similar resources

An improved radial basis function neural network for object image retrieval

Radial Basis Function Neural Networks (RBFNNs) have been widely used for classification and function approximation tasks. Hence, it is worthwhile to improve existing learning algorithms for RBFNNs and develop new ones in order to get better results. This paper presents a new learning method for RBFNNs. An improved algorithm for center adjustment of RBFNNs and a novel algorithm for width determination ha...


ImageSaker: A Semantic-based Image Retrieval System Refining with Concept Model

In this demonstration, a two-level system for semantic-based image retrieval is proposed. To overcome the shortcomings of traditional retrieval systems, we present a novel method which can provide effective retrieval results in a short time. Firstly, it uses surrounding text to get a related candidate image set. Secondly, a semantic network is used to map the keyword to one of concept models w...


Everything Gets Better All the Time, Apart from the Amount of Data

The paper first addresses the main issues in current content-based image retrieval to conclude that the largest factors of innovations are found in the large size of the datasets, the ability to segment an image softly, the interactive specification of the user’s wish, the sharpness and invariant capabilities of features, and the machine learning of concepts. Among these everything gets better ...


TBM Tunneling Construction Time with Respect to Learning Phase Period and Normal Phase Period

In every tunnel boring machine (TBM) tunneling project, there is an initial low-production phase, the so-called Learning Phase Period (LPP), in which low utilization is experienced and the operational parameters are adjusted to match the working conditions. LPP can be crucial in scheduling and evaluating the final project time and cost, especially for short tunnels for which it may constitute a ...


Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We propose an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods. Even though the model has two hidden layers, it can b...




Publication date: 2004